Skip to content

Conversation

@leo-pony
Copy link
Contributor

@leo-pony leo-pony commented Oct 21, 2024

  1. Dynamically loadable backends framework has been added in PR(ggml-backend : add device and backend reg interfaces #9707). Currently, most backends have implemented these interfaces. CANN backend needs to adapt to this mechanism.
    The corresponding Feature Request PR is Feature Request: [CANN] backend adapts to llama.cpp dynamic backend loading mechanism #9862.
  2. Fix the bug:"Inference running result is garbled in debug running model for LM models whose type is Q4_0 class". Issue number Bug: [CANN] inference running result is garbled in debug running model for LM models who's type is Q4_0 class #9979

Function is normal:
image
Performance is the same with before:
image

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Oct 21, 2024
Copy link
Member

@slaren slaren Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the instances of GGML_USE_CANN should be removed from this file, there are still a few remaining.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review, I have removed all the instances of GGML_USE_CANN from llama.cpp.

Comment on lines 591 to 598

#ifdef GGML_USE_AMX
register_backend(ggml_backend_amx_reg());
#endif

// TODO: kompute, cann
#ifdef GGML_USE_CANN
register_backend(ggml_backend_cann_reg());
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not add extra lines, use the same format as the rest of the backends.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review, I have removed the blank lines and the formatting is consistent with the rest of the backends. Function is normal after removing.

* @brief Maximum number of CANN devices supported.
*/
#define GGML_CANN_MAX_DEVICES 16
#define GGML_CANN_NAME "CANN"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that this macro was taken from the implementation in the CUDA backend, but the reason it exists there is because the same code is used to build the ROCm and MUSA backends, so the name of the backend may change depending on the build flags, but I don't think this is necessary in the CANN backend.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review, this macro is no longer exposed to llama. I have moved this macro to the cann implementation file.

@leo-pony leo-pony force-pushed the cann_adapt_backend_dynamically_loadable branch from 6d36f48 to abf5be4 Compare October 22, 2024 02:27
@hipudding hipudding added the Ascend NPU issues specific to Ascend NPUs label Oct 22, 2024
@hipudding
Copy link
Collaborator

The review comment has been fixed. Looks good to me. Approved

@hipudding hipudding merged commit 6b84473 into ggml-org:master Oct 22, 2024
53 checks passed
dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
* [CANN] Adapt to dynamically loadable backends mechanism

* Fix the Bug: inference running result is garbled in debug running model for LM models who's type is Q4_0 class

* Handle the review comments of this pull request
dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
* [CANN] Adapt to dynamically loadable backends mechanism

* Fix the Bug: inference running result is garbled in debug running model for LM models who's type is Q4_0 class

* Handle the review comments of this pull request
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
* [CANN] Adapt to dynamically loadable backends mechanism

* Fix the Bug: inference running result is garbled in debug running model for LM models who's type is Q4_0 class

* Handle the review comments of this pull request

build passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
* [CANN] Adapt to dynamically loadable backends mechanism

* Fix the Bug: inference running result is garbled in debug running model for LM models who's type is Q4_0 class

* Handle the review comments of this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Ascend NPU issues specific to Ascend NPUs ggml changes relating to the ggml tensor library for machine learning

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: [CANN] inference running result is garbled in debug running model for LM models who's type is Q4_0 class

3 participants